REMARKS

The Applicants respectfully request further examination and reconsideration in view of 
the comments set forth fully below. Claims 1-51 were pending. Within the Office Action, 
Claims 1-51 have been rejected. By the above amendments, Claim 41 has been amended.
Accordingly, Claims 1-51 are now pending in the application. 

Objections to the Claims: 

Within the Office Action, Claim 41 has been objected to for the punctuation missing from 
the claim. By the above amendments, Claim 41 has been amended to include additional 
punctuation and spacing for clarity. Accordingly, the objection to the claim should be 
withdrawn. 

Rejections Under 35 U.S.C. § 102: 

Within the Office Action, Claims 1, 2, 7, 8, 11, 13-16, 41, 44, 50 and 51 have been
rejected under 35 U.S.C. § 102(a) as being anticipated by A Bit-Serial VLSI Array Processing 
Chip for Image Processing, IEEE Journal of Solid-State Circuits, Vol. 25, No. 2, April 1990 to 
Heaton et al. (hereinafter "Heaton"). The applicants respectfully disagree. 

Heaton teaches an array processing chip which integrates many processing elements on a 
single die. Each processing element has several components including a 16-function logical unit, 
an adder, a shift register and local RAM. [Heaton, Abstract] Heaton also teaches: 

An OR tree is connected to all PE's on the chip, enabling the values 
presented on each of the 128 PE data buses to be ORed together. This feature 
enables the user to quickly test for a "true" bit in any of the PE's of the array. The 
OR tree is useful in associative operations and in performing data searches. OR 
tree operations are pipelined. The single OR pin output is open drain, enabling 
several BLITZEN chips to be directly wire ORed together. [Heaton, Page 367, 2nd
Paragraph]

However, Heaton does not teach a global accumulation unit to accumulate the results of the 
processing operations for each processing element. 
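
For illustration only, the difference between an OR-type test and an arithmetic accumulation can be sketched in Python. The sketch is not taken from Heaton or from the present specification; the function names and the sample bit pattern are assumptions chosen for the example, and only the figure of 128 PE data buses comes from the quoted passage.

```python
from functools import reduce
from operator import or_

NUM_PES = 128  # the quoted passage refers to 128 PE data buses


def or_tree(pe_bits):
    """OR one bit from each PE together: answers only 'is any bit true?'."""
    return reduce(or_, pe_bits, 0)


def accumulate(pe_results):
    """Arithmetic sum of one result per PE (hypothetical accumulator)."""
    return sum(pe_results)


# Hypothetical data: two PEs out of 128 present a "true" bit.
pe_bits = [0] * NUM_PES
pe_bits[5] = 1
pe_bits[77] = 1

any_true = or_tree(pe_bits)   # 1: at least one PE is true
total = accumulate(pe_bits)   # 2: how much the PEs contributed in total
```

In this sketch the OR result is the same whether one PE or every PE presents a true bit, whereas the accumulated value changes with each contribution.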

In contrast to Heaton, the present invention is directed to a video platform architecture for
video processing that includes complex video compression/decompression algorithms in a computer
with a two-dimensional Single-Instruction Multiple-Data (SIMD) array architecture. The video 
platform architecture includes one or more video processing modules, on-chip shared memory,
and a general-purpose RISC central processing unit (CPU) used as a system controller. Each video
processing module includes a rectangular array of processing elements (PEs), a block load/store 
unit and a global-accumulation unit. Video to be processed is configured into blocks of data and 
a general-purpose CPU used as a local controller. A plurality of registers are provided in the 
processing elements and the block load/store unit to support two-dimensional processing of the 
data blocks. Types of registers used include block registers, vector registers, scalar registers, and 
exchange registers. Each of these registers is designed to hold a short ordered one- or two- 
dimensional set of video data (data blocks). These registers are arranged in a hierarchical 
configuration along the data flow path between the on-chip memory and processing units within 
the PE array. [Present Specification, Abstract] 

Furthermore, in some embodiments, the global accumulation unit includes 4 slice 
accumulation (SACC) registers, 1 global PE mask control register, and 1 global accumulation 
(GACC) register. In some embodiments, there is one SACC register for each vertical PE slice of 
the PE array. The SACC registers are the intermediate registers in the operations moving data 
from the LACC register of each PE to the GACC register. In some embodiments, there are 4 40- 
bit SACC registers in the global accumulation unit. Each of the SACC registers includes three 
individually written sections, namely low 16-bits, middle 16-bits, and high 8-bits. Each PE's 40- 
bit LACC is read in steps, specifying which part of the LACC, low 16-bits, middle 16-bits, or 
high 8-bits, is to be placed on the 16-bit bus to the global accumulation unit, and finally into 
the corresponding section of the appropriate SACC register. During operation of the global
accumulation unit, either the full 40-bit values or packed 20-bit values of the SACC register 
involved in the accumulation operations are added together by a global add instruction and a 
global add and accumulate instruction. The GACC register is used to perform global 
accumulation of LACC values from multiple PEs loaded into the corresponding SACC registers. 
In some embodiments, there is one 48-bit GACC register in the global accumulation unit. 
[Present Specification, page 15, line 24 through page 16, line 8] As described above, Heaton 
does not teach a global accumulation unit to accumulate the results of the processing operations 
for each processing element. 
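
As a rough illustration of the register arrangement described above, the following Python sketch models a 40-bit LACC being read in three steps into the individually written sections of a SACC register, followed by a global add of the full 40-bit SACC values. The class and function names, the unsigned treatment of the values, and the loading of one LACC per SACC register are assumptions made for the example; the packed 20-bit mode and the global add and accumulate instruction are not modeled.

```python
MASK16 = 0xFFFF
MASK8 = 0xFF
MASK40 = (1 << 40) - 1
MASK48 = (1 << 48) - 1


def lacc_sections(lacc):
    """Split a 40-bit LACC value into its low 16, middle 16, and high 8 bits."""
    lacc &= MASK40
    return {
        "low": lacc & MASK16,
        "middle": (lacc >> 16) & MASK16,
        "high": (lacc >> 32) & MASK8,
    }


class SliceAccumulator:
    """A 40-bit SACC register with three individually written sections."""

    def __init__(self):
        self.sections = {"low": 0, "middle": 0, "high": 0}

    def write_section(self, name, value):
        # One step on the 16-bit bus: only the named section changes.
        mask = MASK8 if name == "high" else MASK16
        self.sections[name] = value & mask

    def value(self):
        s = self.sections
        return (s["high"] << 32) | (s["middle"] << 16) | s["low"]


def load_lacc(sacc, lacc):
    """Read the LACC in three steps, one section per step."""
    for name, bits in lacc_sections(lacc).items():
        sacc.write_section(name, bits)


def global_add(saccs):
    """Add the full 40-bit SACC values; the result is held to 48 bits (GACC)."""
    return sum(s.value() for s in saccs) & MASK48


# Four SACC registers, each loaded from one hypothetical PE's LACC value.
lacc_values = [0x123456789A, 1, 2, 0xFFFFFFFFFF]
saccs = [SliceAccumulator() for _ in lacc_values]
for sacc, lacc in zip(saccs, lacc_values):
    load_lacc(sacc, lacc)
gacc = global_add(saccs)
```

The 48-bit width of the result leaves headroom above the 40-bit inputs, so adding the four values in this sketch cannot overflow.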

The independent Claim 1 is directed to a video processing apparatus. The video 
processing apparatus of Claim 1 comprises a memory and one or more video processing 
modules, each video processing module coupled to the memory and comprising a programmable 
array of processing elements, each processing element including local registers to provide data 
used in processing operations and to store results of the processing operations, a block load and
store unit coupled to the programmable array of processing elements to load, store, and send data 
transferred back and forth between the memory and the array of processing elements, a global 
accumulation unit to accumulate the results of the processing operations for each processing 
element and a local controller to provide instructions and parameters related to the processing 
operations and data transfer. As described above, Heaton does not teach a global accumulation 
unit to accumulate the results of the processing operations for each processing element. For at 
least these reasons, the independent Claim 1 is allowable over the teachings of Heaton. 

Claims 2, 7, 8, 11 and 13-16 are all dependent upon the independent Claim 1. As
described above, the independent Claim 1 is allowable over the teachings of Heaton.
Accordingly, Claims 2, 7, 8, 11 and 13-16 are all also allowable as dependent upon an allowable
base claim.

The independent Claim 41 is directed to a programmable array of processing elements to 
process video. Each processing element comprises local registers to store video data blocks 
received from a main memory, to process the received video data blocks, and to store results of 
processing the video data blocks, wherein each processing element is configured to send the 
results to a global accumulation unit to accumulate the results of the processing operations for 
each processing element. As described above, Heaton does not teach wherein each processing 
element is configured to send the results to a global accumulation unit to accumulate the results 
of the processing operations for each processing element. For at least these reasons, the 
independent Claim 41 is allowable over the teachings of Heaton. 

Claims 44, 50 and 51 are all dependent upon the independent Claim 41. As described 
above, the independent Claim 41 is allowable over the teachings of Heaton. Accordingly, Claims 
44, 50 and 51 are all also allowable as dependent upon an allowable base claim. 

Rejections Under 35 U.S.C. § 103: 

Within the Office Action, Claims 3, 4, 9, 10, 42, 43, 45 and 46 have been rejected under 
35 U.S.C. § 103(a) as being unpatentable over Heaton in view of U.S. Patent No. 4,992,933 to 
Taylor (hereinafter "Taylor"). The applicants respectfully disagree. 

Claims 3, 4, 9 and 10 are all dependent upon the independent Claim 1. As described 
above, the independent Claim 1 is allowable over the teachings of Heaton. Accordingly, Claims 
3, 4, 9 and 10 are all also allowable as dependent upon an allowable base claim. 

Claims 42, 43, 45 and 46 are all dependent upon the independent Claim 41. As described
above, the independent Claim 41 is allowable over the teachings of Heaton. Accordingly, Claims 
42, 43, 45 and 46 are all also allowable as dependent upon an allowable base claim. 

Within the Office Action, Claims 12 and 49 have been rejected under 35 U.S.C. § 103(a) 
as being unpatentable over Heaton in view of U.S. Patent No. 4,745,547 to Buchholz (hereinafter 
"Buchholz"). The applicants respectfully disagree. 

Claim 12 is dependent upon the independent Claim 1. Claim 49 is dependent upon the
independent Claim 41. As described above, the independent Claims 1 and 41 are allowable over 
the teachings of Heaton. Accordingly, Claims 12 and 49 are both also allowable as dependent 
upon an allowable base claim. 

Within the Office Action, Claims 5, 6, 15, 47 and 48 have been rejected under 35 U.S.C. 
§ 103(a) as being unpatentable over Heaton in view of U.S. Patent No. 5,680,338 to Agarwal et 
al. (hereinafter "Agarwal"). The applicants respectfully disagree. 

Claims 5, 6 and 15 are all dependent upon the independent Claim 1. As described above,
the independent Claim 1 is allowable over the teachings of Heaton. Accordingly, Claims 5, 6 
and 15 are all also allowable as dependent upon an allowable base claim. 

Claims 47 and 48 are dependent upon the independent Claim 41. As described above, the
independent Claim 41 is allowable over the teachings of Heaton. Accordingly, Claims 47 and 48 
are both also allowable as dependent upon an allowable base claim. 

Within the Office Action, Claims 17-26, 28-38 and 40 have been rejected under 35 
U.S.C. § 103(a) as being unpatentable over Heaton in view of Taylor and further in view of 
Agarwal. The applicants respectfully disagree. 

As described above, Heaton teaches an array processing chip which integrates many 
processing elements on a single die. Each processing element has several components including 
a 16-function logical unit, an adder, a shift register and local RAM. [Heaton, Abstract] Heaton 
also teaches: 

An OR tree is connected to all PE's on the chip, enabling the values 
presented on each of the 128 PE data buses to be ORed together. This feature 
enables the user to quickly test for a "true" bit in any of the PE's of the array. The 
OR tree is useful in associative operations and in performing data searches. OR 
tree operations are pipelined. The single OR pin output is open drain, enabling 
several BLITZEN chips to be directly wire ORed together. [Heaton, Page 367, 2nd
Paragraph]

However, Heaton does not teach a global accumulation unit to accumulate the results of the
processing operations for each processing element. 

Furthermore, Taylor and Agarwal do not teach a global accumulation unit to accumulate 
the results of the processing operations for each processing element. Thus, the combination of 
Heaton, Taylor and Agarwal does not teach a global accumulation unit to accumulate the results 
of the processing operations for each processing element. 

In contrast to Heaton, Taylor, Agarwal and their combination, the present invention is
directed to a video platform architecture for video processing that includes complex video
compression/decompression algorithms in a computer with a two-dimensional Single-Instruction 
Multiple-Data (SIMD) array architecture. The video platform architecture includes one or more 
video processing modules, on-chip shared memory, and a general-purpose RISC central
processing unit (CPU) used as a system controller. Each video processing module includes a
rectangular array of processing elements (PEs), a block load/store unit and a global-accumulation 
unit. Video to be processed is configured into blocks of data and a general-purpose CPU used as 
a local controller. A plurality of registers are provided in the processing elements and the block 
load/store unit to support two-dimensional processing of the data blocks. Types of registers used 
include block registers, vector registers, scalar registers, and exchange registers. Each of these 
registers is designed to hold a short ordered one- or two-dimensional set of video data (data 
blocks). These registers are arranged in a hierarchical configuration along the data flow path 
between the on-chip memory and processing units within the PE array. [Present Specification, 
Abstract] 

Furthermore, in some embodiments, the global accumulation unit includes 4 slice 
accumulation (SACC) registers, 1 global PE mask control register, and 1 global accumulation 
(GACC) register. In some embodiments, there is one SACC register for each vertical PE slice of 
the PE array. The SACC registers are the intermediate registers in the operations moving data 
from the LACC register of each PE to the GACC register. In some embodiments, there are 4 40- 
bit SACC registers in the global accumulation unit. Each of the SACC registers includes three 
individually written sections, namely low 16-bits, middle 16-bits, and high 8-bits. Each PE's 40- 
bit LACC is read in steps, specifying which part of the LACC, low 16-bits, middle 16-bits, or 
high 8-bits, is to be placed on the 16-bit bus to the global accumulation unit, and finally into 
the corresponding section of the appropriate SACC register. During operation of the global
accumulation unit, either the full 40-bit values or packed 20-bit values of the SACC register 
involved in the accumulation operations are added together by a global add instruction and a 
global add and accumulate instruction. The GACC register is used to perform global
accumulation of LACC values from multiple PEs loaded into the corresponding SACC registers. 
In some embodiments, there is one 48-bit GACC register in the global accumulation unit. 
[Present Specification, page 15, line 24 through page 16, line 8] As described above, Heaton, 
Taylor, Agarwal and their combination do not teach a global accumulation unit to accumulate the 
results of the processing operations for each processing element. Heaton, Taylor, Agarwal and 
their combination also do not teach accumulating the results stored in the local accumulators in a 
global accumulator, thereby forming accumulated results. 

The independent Claim 17 is directed to a method of processing video. The method of 
Claim 17 comprises configuring a video stream into data blocks, loading data blocks from 
memory to a first array of exchange registers, loading data blocks from the first array of exchange 
registers to a programmable array of processing elements, wherein each processing element 
within the array of processing elements includes an array of block registers, an array of vector 
registers, and a local accumulator, the data blocks are loaded from the first array of exchange 
registers to the array of block registers, loading the data blocks from the array of block registers 
to the array of vector registers, processing the data blocks loaded in the array of vector registers 
and storing results in the corresponding local accumulator for each processing element, 
accumulating the results stored in the local accumulators in a global accumulator, thereby 
forming accumulated results and moving the accumulated results into a local controller. As 
described above, Heaton, Taylor, Agarwal and their combination do not teach accumulating the 
results stored in the local accumulators in a global accumulator, thereby forming accumulated 
results. For at least these reasons, the independent Claim 17 is allowable over the teachings of 
Heaton, Taylor, Agarwal and their combination. 
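
The sequence of steps recited in Claim 17 can be loosely sketched in Python as follows. This is an illustrative sketch only and is not a construction of the claim: the class and function names are hypothetical, one processing element is assumed per data block, and summing the samples stands in for the unspecified processing operation.

```python
def configure_into_blocks(stream, block_size):
    """Configure a flat stream of video samples into fixed-size data blocks."""
    return [stream[i:i + block_size] for i in range(0, len(stream), block_size)]


class ProcessingElement:
    """Hypothetical PE with block registers, vector registers, and a local accumulator."""

    def __init__(self):
        self.block_registers = []
        self.vector_registers = []
        self.local_accumulator = 0

    def load_block(self, block):
        # Exchange registers -> block registers.
        self.block_registers = list(block)

    def load_vectors(self):
        # Block registers -> vector registers.
        self.vector_registers = list(self.block_registers)

    def process(self):
        # Placeholder operation (assumption): sum the samples in the vector registers.
        self.local_accumulator = sum(self.vector_registers)


def process_video(stream, block_size):
    memory = configure_into_blocks(stream, block_size)
    exchange_registers = [list(block) for block in memory]  # memory -> exchange registers
    pes = [ProcessingElement() for _ in exchange_registers]
    for pe, block in zip(pes, exchange_registers):
        pe.load_block(block)
        pe.load_vectors()
        pe.process()
    # Accumulate the local accumulators in a global accumulator.
    global_accumulator = sum(pe.local_accumulator for pe in pes)
    # The returned value stands in for moving the result into the local controller.
    return global_accumulator


result = process_video(list(range(8)), block_size=4)
```

Here the two blocks [0, 1, 2, 3] and [4, 5, 6, 7] yield local results of 6 and 22, which the global accumulation step combines into 28.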

Claims 18-26 and 28 are all dependent upon the independent Claim 17. As described 
above, the independent Claim 17 is allowable over the teachings of Heaton, Taylor, Agarwal and 
their combination. Accordingly, Claims 18-26 and 28 are all also allowable as dependent upon 
an allowable base claim. 

The independent Claim 29 is directed to a video processing apparatus. The video 
processing apparatus of Claim 29 comprises means for configuring a video stream into data 
blocks, means for loading data blocks from memory to a first array of exchange registers, the 
means for loading data blocks from memory coupled to the means for configuring, means for 
loading data blocks from the first array of exchange registers to a programmable array of 
processing elements, the means for loading data blocks from the first array of exchange registers 

PATENT 

Attorney Docket No.: SONY-27300 

coupled to the means for loading data blocks from memory, wherein each processing element 
within the array of processing elements includes an array of block registers and an array of vector 
registers, the data blocks are loaded from the first array of exchange registers to the array of 
block registers, means for loading the data blocks from the array of block registers to the array of 
vector registers, the means for loading the data blocks from the array of block registers coupled 
to the means for loading data blocks from the first array of exchange registers, means for 
processing the data blocks loaded in the array of vector registers and storing results in the 
corresponding local accumulator for each processing element, the means for processing coupled 
to the means for loading the data blocks from the array of block registers, means for 
accumulating the results stored in the local accumulators in a global accumulator, thereby 
forming accumulated results, the means for accumulating coupled to the means for processing 
and means for moving the accumulated results into a local controller, the means for moving 
coupled to the means for accumulating. As described above, Heaton, Taylor, Agarwal and their 
combination do not teach means for accumulating the results stored in the local accumulators in a 
global accumulator, thereby forming accumulated results. For at least these reasons, the 
independent Claim 29 is allowable over the teachings of Heaton, Taylor, Agarwal and their 
combination. 

Claims 30-38 and 40 are all dependent upon the independent Claim 29. As described 
above, the independent Claim 29 is allowable over the teachings of Heaton, Taylor, Agarwal and 
their combination. Accordingly, Claims 30-38 and 40 are all also allowable as dependent upon 
an allowable base claim. 

Within the Office Action, Claims 27 and 39 have been rejected under 35 U.S.C. § 103(a) 
as being unpatentable over Heaton in view of Taylor in view of Agarwal and further in view of 
Buchholz. The applicants respectfully disagree. 

Claim 27 is dependent upon the independent Claim 17. Claim 39 is dependent upon the 
independent Claim 29. As described above, the independent Claims 17 and 29 are allowable
over the teachings of Heaton, Taylor, Agarwal and their combination. Accordingly, Claims 27 
and 39 are both also allowable as dependent upon an allowable base claim. 



Applicants respectfully submit that the claims are in a condition for allowance, and 
allowance at an early date would be appreciated. Should the Examiner have any questions or 
comments, they are encouraged to call the undersigned at (408) 530-9700 to discuss the same so 
that any outstanding issues can be expeditiously resolved. 

Respectfully submitted, 
HAVERSTOCK & OWENS LLP 



Dated: March 13, 2008 By: /Jonathan O. Owens/

Jonathan O. Owens 
Reg. No.: 37,902 
Attorneys for Applicants 